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Abstract 

Anopheles gambiae M and S are thought to be undergoing ecological speciation by adapting to different larval habitats. Toward an 
improved understanding of the genetic determinants and evolutionary processes shaping their divergence, we used a 400,000 
single-nucleotide polymorphism (SNP) genotyping array to characterize patterns of genomic differentiation between four geograph- 
ically paired M and S population samples from West and Central Africa. In keeping with recent studies based on more limited genomic 
or geographic sampling, divergence was not confined to a few isolated "speciation islands. " Divergence was both widespread across 
the genome and heterogeneous. Moreover, we find consistent patterns of genomic divergence across sampling sites and mutually 
exclusive clustering of M and S populations using genetic distances based on all 400,000 SNPs, implying that M and S are evolving 
collectively across the study area. Nevertheless, the clustering of local M and S populations using genetic distances based on SNPs from 
genomic regions of low differentiation is consistent with recent gene flow and introgression. To account for these data and reconcile 
apparent paradoxes in reported patterns of M-S genomic divergence and hybridization, we propose that extrinsic ecologically based 
postmating barriers vary in strength as environmental conditions fluctuate or change. 
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Introduction 

A fundamental problem in evolutionary biology is to under- 
stand the genomic architecture associated with adaptive 
population divergence and reproductive isolation. Scans of 
genome-wide patterns of differentiation between populations 



can be used to map candidate regions contributing to isola- 
tion, especially when diversification is occurring despite some 
ongoing gene flow (Via 2009; Nosil and Feder 2012). In the 
presence of gene flow, only genome regions under strong 
divergent natural selection are expected to resist introgression, 
resulting in a heterogeneous pattern of differentiated and 
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homogeneous regions (Wu 2001). Genome-wide scans of dif- 
ferentiation between incompletely reproductively isolated 
populations have largely revealed the predicted heteroge- 
neous pattern (Nosil et al. 2009; Nosil and Feder 2012). 
However, interpretation of genome scans is fraught with dif- 
ficulties (Noor and Bennett 2009; Turner and Hahn 201 0), and 
unresolved questions remain concerning the impact of gene 
flow on the number, size, and genomic distribution of excep- 
tionally diverged regions during the speciation process. 

At the heart of the debate is the role that genome structure 
plays in facilitating population divergence in the presence of 
gene flow. Two central issues concern: 1) the importance of 
physical features of the genome that reduce recombination 
(e.g., chromosomal rearrangements and centromeres) in fos- 
tering divergence and 2) whether divergence is predicated on 
a small number of genes clustered in a few genomic "islands 
of speciation" (Turner et al. 2005) or on a larger number 
of genes arrayed across the genome. Empirical studies have 
revealed conflicting patterns (Nosil et al. 2009). It is un- 
clear whether these differing outcomes are the result of: 
1) truly different evolutionary mechanisms acting among 
taxa, 2) comparisons involving disparate stages along the 
divergence continuum, 3) studies employing different sam- 
pling schemes with varying levels of resolution and sensitivity 
for detecting divergence, or 4) some combination of the 
above. 

The M and S forms of the African malaria mosquito 
Anopheles gambiae (Diptera: Culicidae) (della Torre et al. 
2001) represent a model system for the study of ecological 
speciation (Rundle and Nosil 2005; Schluter 2009). These mos- 
quitoes are thought to be diverging despite incomplete repro- 
ductive isolation due to differential ecological adaptation to 
alternative larval habitats (Diabate et al. 2008; Gimonneau 
et al. 2012a; Kamdem et al. 2012; Lehmann et al. 1997) in 
different eco-geographical settings (Caputo et al. 2008; 
Costantini et al. 2009; Simard et al. 2009). The M form 
mainly exploits more temporally stable larval habitats contain- 
ing relatively high levels of environmental stressors; in the sa- 
vannas of West Africa, M larval habitats tend to be rice fields 
with high predator density, whereas in the Central African 
rainforest, these may be polluted urban dumping grounds. 
Conversely, the S form exploits ephemeral habitats associated 
with seasonal rainfall that are largely unpolluted and predator 
free. Despite the difference in larval habitat, M and S overlap 
extensively and apparently continuously across much of West 
and Central Africa, where they may be strictly sympatric and 
synchronously breeding (della Torre et al. 2005). Premating 
reproductive barriers (Diabate et al. 2009; Pennetier et al. 
2010; Sanford et al. 201 1) generally limit heterotypic matings 
between local M and S populations to ~1% across most of 
their range of overlap (Tripet et al. 2001; della Torre et al. 
2005). The exception is in the western extreme of their 
West African range, where hybridization rates as high as 
20% have been recorded (Caputo et al. 2008, 2011; 



Oliveira et al. 2008). No intrinsic postmating barriers have 
been detected between M and S (Diabate et al. 2007; Hahn 
etal. 2012). However, extrinsic, environment-based postmat- 
ing barriers are presumed to be strong and to act against 
maladapted hybrids in the alternate M versus S larval habitats 
(Lehmann and Diabate 2008; Turner and Hahn 2010). 
Although pre- and postmating reproductive isolation may 
reduce effective gene flow significantly through most of 
West and Central Africa, neither barrier is absolute. The rea- 
lized rate of gene flow between M and S is not known (Turner 
and Hahn 2010; White et al. 2010; Hahn et al. 2012), but 
several lines of evidence indicate that at least some contem- 
porary gene flow is occurring. Single-nucleotide polymorph- 
ism (SNP) genotyping of individual mosquitoes from Guinea 
Bissau and Ghana has identified individuals of mixed ancestry 
(Marsden et al. 201 1; Weetman et al. 2012) and genes con- 
ferring resistance to insecticides and/or pathogens, as well 
as other apparently neutral sequences, have introgressed 
between M and S (Djogbenou et al. 2008; Etang et al. 
2009; White et al. 2011; Choi and Townson 2012). 
However, as we discuss later, the extent of introgression is 
still an open question at the center of ongoing debate that 
has important ramifications for understanding M-S 
divergence. 

The M and S forms of A. gambiae have played a key role in 
the development of current concepts about speciation gen- 
omics. The metaphor of "speciation islands" was originally 
developed by Turner et al. (2005) to describe the pattern of 
genomic differentiation detected in the first microarray-based 
genome scan of M versus S divergence. Interrogating the M 
and S genomes with ~1 42,000 unique probes from predicted 
protein-coding genes, only three small regions of significant 
M-S differentiation were observed. Two of these regions were 
adjacent to centromeres on chromosomes 2 and X, and the 
third was located on chromosome 2R. Collectively, the differ- 
entiated regions represented <2.8Mb of the total 260 Mb 
A. gambiae genome (~1%) and contained only 67 of the 
12,670 annotated protein coding genes (0.5%). Interpreted 
under the prevailing assumption of substantial introgressive 
hybridization between M and S, these results suggested that 
the genes responsible for ecological and reproductive isolation 
were few in number and mainly confined to three isolated 
"speciation islands," the largest two of which coincided with 
low recombination regions near centromeres on chromosome 
2 and the X. Low differentiation across the remainder of 
the genome was attributed to the homogenizing effects 
of gene flow. 

Subsequent studies cast doubt on the widely held assump- 
tion of substantial introgressive hybridization between M and 
S. Following the microarray-based approach of Turner et al. 
(2005), White et al. (2010) documented three regions of ele- 
vated M-S divergence adjacent to the centromeres on all 
three chromosomes comprising the A. gambiae complement 
(centromeric divergence on chromosome 3 had been 
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obscured in the original study due to incomplete genome 
assembly in that region). White et al. (2010) also found 
near-complete association of form-specific alleles in these un- 
linked centromeric regions within both forms, demonstrating 
strong linkage disequilibrium between M genotypes (and 
S genotypes) across a study area including Mali, Burkina 
Faso, and Cameroon. Hahn et al. (2012) argued that in the 
absence of evidence for biased cotransmission of form-specific 
allelic combinations (e.g., centromeric drive) on these other- 
wise independently assorting chromosomes, such strong link- 
age disequilibrium would be unlikely if levels of introgression 
were qualitatively similar to the ~1% hybridization rate 
(Turner and Hahn 2010). They, therefore, posed an alternative 
minimal gene flow hypothesis to explain the heterogeneous 
pattern of M-S genome divergence. Under this hypothesis, 
low divergence is largely due to shared ancestral polymorph- 
ism resulting from incomplete lineage sorting. Moreover, 
the high divergence centromere-proximal regions are not 
necessarily "speciation islands." Rather, they may contain ad- 
vantageous alleles that arose and swept to high frequency 
separately in M and S populations but need not be responsible 
for ecological or reproductive isolation between the two taxa 
(Noor and Bennett 2009; Turner and Hahn 201 0; White et al. 
201 0; Hahn et al. 201 2). Resolving the extent of introgression 
between M and S is therefore critical for interpreting the 
meaning of patterns of genomic divergence between these 
two mosquitoes. 

Questions have also arisen concerning the original charac- 
terizations of genomic islands of speciation in A gambiae. The 
apparent concentration of M-S divergence in isolated regions 
of reduced recombination could be an artifact of: 1) the rela- 
tively low-resolution gene-based approach used in initial mos- 
quito genome scan studies and/or 2) inadequate sampling 
designs in which only five individuals of each taxon were 
hybridized to arrays, effectively constraining detectable diver- 
gence to regions of fixed or highly skewed frequency 
differences. More recent higher resolution genomic scans of 
M-S divergence in Mali based on genome resequencing 
(Lawniczak et al. 2010) and genotyping of more than 
400,000 SNPs (Neafsey et al. 2010) have revealed additional 
regions of divergence dispersed along all chromosome arms in 
the genome that were missed by the original gene-based 
arrays. These studies and other lower density SNP genotyping 
surveys of natural M and S populations from Ghana, 
Cameroon, and Guinea Bissau (Weetman et al. 2010, 2012) 
demonstrate that divergence is more widespread and exten- 
sive than previously appreciated. 

The changing view of the structure of genomic differenti- 
ation between M and S calls for more detailed and standar- 
dized studies to firmly resolve the nature, as well as 
consistency, of divergence across the range of overlap of 
these mosquitoes and an assessment of the extent to which 
the pattern has been affected by gene flow. The former task 
has been primarily limited by methodology and adequate 



sampling, which can be rectified. However, assessing the 
impact of gene flow is more difficult. One approach adopted 
by two recent studies has been to investigate population gen- 
omic divergence in an area of unusually high hybridization 
rates (-20%) in Guinea Bissau (Oliveira et al. 2008). Based 
on detection of admixed individuals, these studies showed 
that the level of introgression in Guinea Bissau was qualita- 
tively consistent with the high rate of hybridization, suggesting 
that hybrids might not be as strongly selected against at this 
locality as they are presumed to be elsewhere in Africa 
(Marsden et al. 2011; Weetman et al. 2012). However, the 
apparently high level of introgression was asymmetric (largely 
from M to S), and mating was still not panmictic in Guinea 
Bissau (Caputo et al. 2011; Marsden et al. 2011; Weetman 
et al. 2012). Moreover, 15 outlier SNPs (mostly near centro- 
meres but one on 3R in an area of normal recombination) 
were found to display consistent M-S divergence in Guinea 
Bissau and other locations in West and Central Africa, sug- 
gesting that these regions are being maintained by 
form-related divergent selection countering gene flow 
(Weetman et al. 2012). 

Guinea Bissau, however, may represent an anomalous area 
where M and S have recently come into secondary contact 
(Caputo et al. 201 1), and it is possible that changing environ- 
mental conditions have reduced the strength of ecologically 
based postmating barriers. If so, studying M-S divergence in 
Guinea Bissau and other areas of atypically high hybridization 
where ecological and reproductive isolation may be breaking 
down (Caputo et al. 201 1 ; Weetman et al. 201 2) is unlikely to 
be broadly instructive about genome regions that contribute 
to M-S isolation under conditions more typical of the range of 
overlap between these two mosquitoes in West and Central 
Africa. 

Accordingly, here we characterize M-S divergence in mul- 
tiple geographic regions where recorded hybridization rates 
are more typical (0-1%; della Torre et al. 2005; Costantini 
et al. 2009; Simard et al. 2009). Although the relative rarity 
of M-S hybridization compared with Guinea Bissau can make 
it difficult to sample F1 hybrids in these areas, a 1% rate of 
interbreeding is not trivial in population genetic terms and 
would be sufficient to homogenize between-form variation 
unless countered by strong selection (Slatkin 1987). The only 
previous study to have mapped genomic divergence between 
natural population samples of M and S at very high resolution 
(more than 400,000 markers) studied only one locality in Mali 
(Neafsey et al. 2010). We extend that study by assessing the 
nature and consistency of patterns of M-S divergence and 
testing for evidence of gene flow through genome-wide 
scans of four replicated pairs of local M and S populations 
from West and Central Africa, using the same custom 400K 
SNP genotyping array (Neafsey et al. 2010). We report results 
confirming widespread and heterogeneous yet consistent 
genomic differentiation between M and S mosquitoes 
across West and Central Africa. We also found evidence for 
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recent introgression between local M and S demes. Toward a 
reconciliation of the apparent paradox between strong gen- 
etic associations of form-specific alleles across the genome 
and credible evidence of ongoing introgressive hybridization, 
we propose that extrinsic ecologically based postmating bar- 
riers vary in strength as a function of fluctuating or changing 
environmental conditions. 

Materials and Methods 

Mosquito Sampling and Identification 

Mosquitoes were collected as indoor resting or host-seeking 
adults from the five localities numbered in figure 1 . Those 
localities are 1) the southern Malian villages given in Neafsey 
et al. (2010); the Burkina Faso villages of 2) Samandeni 
(11°27'N, 04°27'W), and 3) Monemtenga (12°06'N, 
01°17 r W), sampled in 2005; and the Cameroonian villages 
of 4) Campo (02°22'N, 09°49'E), and 5) Bibouleman 
(02°52'N, 11°15'E) f sampled in 2005. The first three localities 
were sites where M and S mosquitoes were collected to- 
gether, whereas sites 4 and 5 represented localities where 
M and S, respectively, were collected in geographic proximity 
(distance of 169 km separating populations). Morphological 
analysis was first performed to identify collected mosquitoes 
as belonging to the A gambiae sibling species complex; only 
female mosquitoes were analyzed further. Molecular deter- 
mination of A. gambiae M or S form was based on a diagnos- 
tic polymerase chain reaction-RFLP molecular assay targeting 
the ribosomal DNA repeat on the X chromosome 
(Santolamazza et al. 2004). Karyotyping of chromosome 2 
inversions was not performed for these samples, but the 
known physical location of chromosomal rearrangements 
that commonly segregate in both forms (all but one are 
shared polymorphisms; della Torre et al. 2005) is indicated 
in figure 2. 




Fig. 1. — Sampling locations in Africa. Localities sampled are repre- 
sented by numbered circles. Corresponding village names and geographic 
coordinates are given in Materials and Methods. 



Array-Based Genotyping 

DNA was extracted from individual female mosquitoes using 
the DNeasy kit (QIAGEN). The quantity and concentration of 
DNA in each extraction was determined on a SpectraMax 
M2 microplate reader (Molecular Devices) using Quant-iT 
PicoGreen dsDNA reagents (Invitrogen). Separate pools of M 
and S genomic DNA were made from 20 female mosquitoes 
of each form for all four of the paired collecting sites. A pre- 
vious study using the A. gambiae 400K SNP array demon- 
strated a very high correlation between results from 
aggregate individual hybridizations and pooled hybridizations 
of 20 individuals (Pearson's ^ = 0.96 [Neafsey et al. 2010]). 
We therefore adopted the pooled assay design for economy. 
Each of 20 individuals contributed 250 ng of DNA to their 
respective pool. The pooled mosquito DNA samples were pro- 
cessed following a modified version of the standard human 
500K Affymetrix array protocol as previously described in 
Neafsey et al. (201 0). When the DNA quantity from individual 
mosquitoes was insufficient (locality 2 for both M and S sam- 
ples; fig. 1), whole-genome amplification was performed be- 
fore pooling using the REPLI-g Mini kit (QIAGEN). Pooled 
samples were hybridized to a custom Affymetrix SNP geno- 
typing array using the methods described in Neafsey et al. 
(2010). 

Briefly, the SNP array assays variable SNPs in the A. gambiae 
genome identified through capillary sequencing of M and 
S colonies from Mali (Lawniczak et al. 2010). A subset of 
400,151 SNPs was chosen for inclusion on the array based 
primarily on uniformity of SNP coverage across the genome 
and conformity with design parameters of Affymetrix. (As 
80 of the 400,151 SNPs on the array are Y-linked, these 
were not analyzed in this study given that samples consist of 
only female mosquitoes; the analysis, therefore, included 
400,071 SNPs on the X and autosomes.) SNPs within 15 bp 
of each other or insertion/deletions were excluded from the 
array. The median distance between assayed SNPs is ~300 bp, 
and 98% of annotated genes contain at least one assayed 
SNP. A total of 59,878 SNPs occur in coding sequence. It is 
important to note that because the SNPs on the genotyping 
array were identified from M and S A. gambiae colonies 
established from Mali, SNPs may not be geographically repre- 
sentative, and, thus, our data may suffer from a degree of 
ascertainment bias. As the failure to detect hidden allelic vari- 
ation is unlikely to result in false signals of genomic diver- 
gence, results we report about the shared architecture of 
M-S genomic divergence across collecting sites should be 
robust to ascertainment bias. However, one possible manifest- 
ation of ascertainment bias would be the underestimation of 
M-S divergence (and within-taxon divergence) in populations 
geographically distant from Mali. The extent to which this 
underestimation may have affected our results is unknown 
in the absence of resequencing data from the relevant geo- 
graphic locations. 
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Raw hybridization data were subject to robust multi-array 
analysis background correction and quantile normalization 
using Affymetrix Power Tools. Following Neafsey et al. 
(2010), we used the Contrast statistic generated by 
BRLMM-P (Rabbee and Speed 2006) to compare hybridization 
results between pooled samples. Briefly, the Contrast statistic 
is a measure of the relative hybridization signal intensity of 
alternative alleles (5 a and S b ) in a pool, given by the expression 
(5 a - 5 b )/(5 a + 5 b ) when the correction coefficient (k) = 1 . 
Contrast value data for this study are deposited in 
VectorBase (www.vectorbase.org, last accessed November 
20, 2012). 

Outlier Window Analysis 

To assess large-scale patterns of genomic differentiation, we 
calculated the mean difference in Contrast values between 
pools of M and S mosquitoes at each paired site in non- 
overlapping windows of 50 adjacent SNPs. This resulted in a 
total of 7,998 windows being analyzed across the genome: 
4,102 on chromosome 2; 3,066 on chromosome 3; and 830 
on the X. The mean physical distance spanned by 50 adjacent 
SNPs was 28.2 kb (range = 9.1 -280 kb, standard devi- 
ation = 16.6 kb). Individual windows were tested for signifi- 
cant differences in Contrast values between M and S at each 
collecting site by nonparametric, Monte Carlo bootstrapping 
to generate an expected probability distribution to compare to 
observed values. A total of 10 million iterations were per- 
formed to generate the expected distribution by randomly 
resampling 50 individual SNP assays from across the 
genome. Separate analyses were conducted for the X 
chromosome and the autosomes. We used a significance 
threshold alpha value of 0.05 adjusted for the number of 
windows examined by Bonferroni correction to identify indi- 
vidual windows that diverged substantially from expectation. 

Genetic Distance Trees 

We calculated pairwise distances between each population 
pair as the mean of the absolute value of Contrast value dif- 
ferences at each SNP. On the basis of a matrix of these mean 
Contrast value differences, we constructed neighbor-joining 
trees using the APE package in R (Paradis et al. 2004). 
Bootstrapping (1,000 replicates) was performed on the trees 
using a custom R script; bootstrap support was summarized 
using CONSENSE within PHYLIP (Felsenstein 2004). The trees 
were used to examine two important issues concerning M-S 
diversification: 1) whether the pattern of genetic divergence 
for all SNPs resulted in mutually exclusive clustering of M and 
S, implying the collective evolution of each form across the 
study area and 2) whether patterns of divergence for a subset 
of SNPs in low divergence regions provided any evidence for 
introgression. 

To assess the first question of taxonomic clustering of M 
and S, we constructed genetic distance trees based on all 



400,071 SNPs and a subset of SNPs from high divergence 
regions of the genome. High divergence regions were defined 
as the 188 significantly diverged 50-SNP windows (9,400 
SNPs) common to all four population pairs. (Among this set 
of SNPs from high divergence regions, only ~29% were 
located in centromere-associated "speciation islands" as de- 
limited by White et al. [2010, p. 2284]). We then examined 
the trees to determine whether M and S populations could be 
resolved into two distinct genetic clusters across the sampling 
locations in West and Central Africa. Alternatively, local M and 
S populations may differ significantly, but all M and S popu- 
lations may not form mutually exclusive clusters across the 
study area. 

To examine the second question of gene flow, we note 
that a subset of SNPs has characteristics that make them a 
priori candidates for introgression, one of which is occurrence 
in low divergence regions of the genome. Low divergence 
regions were empirically defined as the 138 50-SNP windows 
(6,900 SNPs) whose mean Contrast value differences fell in 
the bottom 15% of the distributions for all four localities. 
Among these 6,900 SNPs from low divergence regions, the 
subset that represent synonymous substitutions in coding re- 
gions (659 SNPs) are most likely to be neutral and, thus, the 
best candidates for introgression between M and S. 
Consequently, genetic distance trees based on these SNP sub- 
sets provide a means to test for possible gene flow. The pre- 
diction is that if gene flow is ongoing and has occurred at a 
sufficient level to homogenize allele frequencies between local 
M and S populations, then genetic distance trees based on 
these SNP subsets should group M and S populations by geo- 
graphic proximity and not by form. Synonymous SNPs may still 
be indirectly affected by physical linkage to other SNPs experi- 
encing form-related selection. However, this concern is miti- 
gated by the short (~300bp) median genomic distance 
between SNPs assayed on the array (Neafsey et al. 2010) 
and the generally short physical scale (<300bp) of linkage 
disequilibrium in A. gambiae (Harris et al. 2010; Neafsey 
et al. 2010; Weetman et al. 2010). 

Results 

Pattern of Genome-Wide Divergence 

Figure 2 depicts mean Contrast values for nonoverlapping 50 
SNP windows plotted along chromosomes for pooled M 
versus S comparisons at each of the four pairs of collection 
sites across West and Central Africa. At all four sites, the most 
pronounced peaks of M-S genomic divergence were asso- 
ciated with the centromere-proximal regions of chromosomes 
2, 3, and the X, as previously found. However, each plot also 
revealed numerous other windows of significant divergence, 
occurring at irregular intervals along all three A. gambiae 
chromosomes. 

Approximately 300 significantly diverged 50-SNP outlier 
windows were found for each of the four paired M-S 
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Fig. 2. — Genome scan of divergence along chromosome arms between sympatric population samples of M and S. Mean difference in Contrast values 
per nonoverlapping 50-SNP window is plotted: (A) in Mali, for M and S from locality 1 ; (B, Q in Burkina Faso, for M and S from localities 2 and 3, respectively; 
and (D) in Cameroon, for M and S from localities 4 and 5. Vertical gray shading behind chromosome 2 marks the regions spanned by polymorphic inversions 
j, b, c, u, and a. Horizontal dashed line is the threshold for significant divergence. Across the top of panel A, vertical purple lines mark the approximate 
chromosomal position of 50-SNP windows significantly divergent in all four geographic locations. 
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comparisons out of a total of 7,998 windows tested at each 
geographic location. The majority of these outlier windows 
were significant across multiple localities (all four sites = 188/ 
427 total outlier windows, 44%; three of four sites = 70/427, 
16.4%; and two of four sites = 60/427, 14%) implying a 
degree of consistency in the geographic pattern of genomic 
divergence between M and S. This consistency was confirmed 
by Spearman Rank correlation analyses of M-S Contrast value 
differences for windows compared across geographic loca- 
tions. Genome-wide divergence patterns were more similar 
among locations in West Africa (Spearman's r s = 0.74-0.77) 



than for comparisons between West and Central African lo- 
cations (Spearman's r s = 0.65-0.69). However, all correlations 
were highly significant (P< 2.2E-16). 

The geographic consistency of genomic differentiation 
between M and S across locations was further reflected in 
the Neighbor-joining genetic distance tree. Figure 3A shows 
the relationships among M and S populations based on 
mean Contrast value differences across all 400,071 SNPs. 
Populations of M and S formed two discrete and highly sig- 
nificant 100% bootstrap-supported genetic clusters in the 
tree. Thus, whole-genome differentiation distinguished M 
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Fig. 3. — Genetic distance trees for M and S populations based on 
alternative genome partitions: {A) all 400,071 SNPs; (B) the 9,400 SNPs 
from 1 88 windows consistently associated with significant M-S divergence 
across the study area; (O the 6,900 SNPs from 138 windows of consist- 
ently low divergence; and (D) the 659 synonymous SNPs from 138 win- 
dows of consistently low divergence. M and S populations are represented 
in red and blue, respectively, and numbered according to the locality in 
figure 1 where they were collected. All nodes are supported by 100% of 
1,000 bootstrap replicates except the node connecting M4 and S5 in 
panels Cand D, which was supported by 94% and 92% of the bootstrap 
replicates, respectively. 

from S across their range of overlap in West and Central 
Africa, implying that they are evolving collectively across the 
study area (Rieseberg and Burke 2001). Using only the subset 
of 9,400 SNPs for tree construction from significantly differ- 
entiated genomic regions that were consistently associated 
with M-S divergence, the difference between M and S was 
drawn in even starker contrast (fig. 3B). 

Evidence for Introgression 

To address the issue of introgression, we constructed a genetic 
distance tree based on subsets of SNPs from genomic regions 
that consistently ranked in the bottom 15% of divergence 
across all four localities (see Materials and Methods) and com- 
pared this to the tree based on all 400,071 assayed SNPs 



(fig. 3). In contrast to both the whole-genome tree (fig. 3A) 
and the tree based on SNPs in consistently high divergence 
genomic regions (fig. 3B), the genetic distance trees based on 
SNPs from low divergence regions (fig. 30 and synonymous 
SNPs from low divergence regions (fig. 3D) grouped local M 
and S populations together, differentiating populations by 
geographic proximity and not by form, as predicted if they 
have recently been subject to introgression. Further, introgres- 
sion is expected to differentially involve the genetic exchange 
of neutral SNPs rather than those associated with ecological 
adaptation and reproductive isolation. A manifestation of this 
should be decreased genetic distance for neutral SNPs (due 
primarily to drift) compared with divergently selected SNPs. 
This prediction is consistent with the substantially shorter 
branch lengths in figure 3D versus figure 3B, assuming that 
synonymous SNPs in low divergence regions represent mainly 
neutral variants and that high divergence regions are likely to 
contain some divergently selected SNPs. 

Discussion 

Ecological selection is assumed to be the driver of M-S speci- 
ation. In West Africa, the main ecological distinction between 
the presumed ancestral S form and the derived M form of 
A. gambiae (Costantini et al. 2009) is the larval habitat; M 
breeds in irrigated rice fields and thus may have originated 
in association with rice domestication in Africa. African rice 
{Oryza glaberrima) was domesticated from a wild ancestor 
~2, 000-3, 000 years ago in the inland delta floodplains of 
the upper Niger River in Mali, with subsequent spreading 
along Sahelian rivers to two secondary domestication centers, 
one in coastal Gambia, Casamance (Senegal), and Guinea 
Bissau; the other in the Guinea forest between Sierra Leone 
and the western Ivory Coast (Porteres 1976; Li et al. 2011). 
However, aside from speculation about the parallel origin of 
M with African rice, little is actually known of the timing of 
M-S divergence, their geographic origins, biogeographic his- 
tory of contact, or demography (Crawford and Lazzaro 201 0). 
This gap in our knowledge complicates interpretation of the 
population genomic patterns obtained from genome scans of 
M and S. One of the most relevant unknowns in the context of 
speciation genomics concerns whether M and S began diver- 
ging in allopatry and subsequently came into secondary con- 
tact or whether they have been in contact and exchanging 
genes since the initiation of ecological divergence. Neither this 
study nor other studies reporting on admixed individuals 
(Marsden et al. 2011; Weetman et al. 2012) or introgressed 
alleles (Djogbenou et al. 2008; Etang et al. 2009; White et al. 
201 1; Choi and Townson 2012) bear on this question; infer- 
ence of recent introgressive hybridization between M and S 
informs only about current gene exchange and not about its 
history or the geographic context of initial divergence. As 
such, although the genomic distribution of M-S divergence 
can be a useful guide both to the location of genes 
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underpinning adaptive divergence between M and S and 
how divergence is maintained in the face of current gene 
flow, it does not speak to the question of how initial diver- 
gence may have built up in the genome in the presence of 
gene flow. 

When the A. gambiae M and S genomes were sequenced 
from polymorphic and newly established colonies from Mali, 
the ensuing population genomic analysis concluded that M-S 
divergence was widespread beyond isolated "speciation is- 
lands" surrounding the centromeres (Lawniczak et al. 2010). 
Although genetic drift resulting from colonization and labora- 
tory maintenance could have augmented divergence between 
these colonies relative to natural populations of M and S, the 
same general pattern of widespread genomic divergence was 
observed using the 400K SNP genotyping array to map gen- 
omic divergence in natural population samples of M and S 
from the same part of Mali (Neafsey et al. 2010). Two subse- 
quent studies of M and S from other parts of Africa have 
confirmed some consistent M-S divergence outside of centro- 
meric regions, but their resolution was limited by genotyping 
at only -900 SNPs (Weetman et al. 2010, 2012). Our 
genome-wide survey of 400,071 SNPs between paired popu- 
lations of M and S in West and Central Africa confirms the 
geographic consistency of the more widespread nature of 
M-S genomic divergence. As noted elsewhere, widespread 
genomic divergence is not surprising in view of low gene 
densities in the centromere-proximal regions and the multi- 
farious nature of phenotypic differentiation between M and S 
(White et al. 2010; Weetman et al. 2012). Our study also 
confirms that the topology of M-S divergence is not uniform. 
Many regions of elevated mean divergence can be found 
interspersed with lower divergence regions. Given the gener- 
ally low levels of linkage disequilibrium characteristic of the 
A. gambiae genome outside of inversions and centromeric 
regions, and the dense spacing of SNPs on the array, this 
suggests that many loci throughout the genome may be in- 
dependently subject to divergent selection between M and 
S. Moreover, it is likely that M-S divergence is even more 
extensive than we have been able to demonstrate given the 
relatively course resolution of our conservative, window-based 
analysis. Some genes contributing to differential adaptation 
may not be detectable as peaks on such genome scans, par- 
ticularly if the beneficial mutation(s) arose from standing vari- 
ation and/or the effect sizes are very small; more modest 
changes in allele frequency may nonetheless be significant 
for adaptive events in natural populations (Pritchard and Di 
Rienzo 2010). 

Our analysis of genome regions characterized by consist- 
ently low M-S divergence across the study area was aimed to 
assess whether evolutionarily significant levels of introgression 
are occurring between M and S, and thus whether these in- 
cipient species can withstand low levels of introgression with- 
out exhibiting the collapse of differentiation observed in 
Guinea Bissau (Marsden et al. 2011; Weetman et al. 2012). 



Specifically, the clustering of geographically proximate M and 
S populations based on SNPs from low divergence windows 
(fig. 30 and the subset of synonymous SNPs from those win- 
dows (fig. 3D) are consistent with low levels of recurrent gene 
exchange homogenizing neutral variation between local 
populations. Independent support for M-S gene flow previ- 
ously inferred from limited population and/or genomic 
sampling includes apparent backcross progeny detected in 
natural populations (White et al. 2010), allelic introgression 
(Djogbenou et al. 2008; Etang et al. 2009; White et al. 
201 1 ; Choi and Townson 2012), and identification of individ- 
ual mosquitoes of mixed ancestry (Marsden et al. 2011; 
Weetman et al. 2012). Taken together, the weight of evi- 
dence therefore strongly supports introgression. Alternative 
scenarios cannot be completely discounted to explain how 
synonymous SNPs might cluster local M and S populations. 
These scenarios involve different combinations of: 1) whether 
selection or genetic drift acting on shared ancestral poly- 
morphism is the primary cause underlying M-S similarity and 
2) whether the M form arose once in allopatry from S or mul- 
tiple times from different local S populations (fig. 4). However, 
random genetic drift is not expected to cause a repeated pat- 
tern of clustering by geographic location. Moreover, both the 
generally low levels of linkage disequilibrium across much of 
the A. gambiae genome — which should minimize associations 
between focal synonymous SNPs and nearby adaptive vari- 
ants, and the absence of extensive barriers to mosquito mi- 
gration in this part of Africa — which should erase the genetic 
signature distinguishing local M populations, render these al- 
ternatives unlikely. Indeed, both indirect and direct estimates 
of gene flow in A. gambiae M and S suggest high rates of 
intraform migration, shallow population structure, and few 
restrictions to gene flow across the entire African continent 
(Taylor et al. 2001; Lehmann et al. 2003; Tripet et al. 2005). 
Although there are indications of population subdivision 
within M between West and Central Africa (Slotman et al. 
2007; Lee et al. 2009), the potential for rapid exchange of 
beneficial alleles between these locations has been demon- 
strated by the spread of insecticide resistance alleles (Djegbe 
et al. 201 1). All indications therefore point to at least some 
gene flow between M and S sufficient to allow for adaptive 
gene flow and for neutral alleles to be more similar in fre- 
quency between local M and S populations than more geo- 
graphically distant populations. More definitive evidence 
awaits a new wave of genome sequencing of individual mos- 
quitoes, made possible by falling costs and increasing through- 
put of next-generation sequencing technology. Although 
economical at present, a limitation of the pooled sample ap- 
proach used in our current SNP genotyping study is an inability 
to determine multilocus genotypes for individual mosquitoes 
to conduct a comprehensive test for introgression. Genome 
sequencing data from individual mosquitoes will enable an 
estimation of the amount and timing of current and poten- 
tially even long-term gene flow, although methods for the 
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Fig. 4. — Alternative scenarios that could explain the clustering of M and S populations by geographic proximity rather than by form in the genetic 
distance tree based on synonymous SNPs from low divergence regions of the genome (fig. 3D). Note that the geographic directionality of population splitting 
and spreading was arbitrarily chosen and is for illustrative purposes only. 



latter have proven problematic and need further development 
(Strasburg and Rieseberg 201 1). 

Although only S occurs in East Africa, the present-day 
distributions of M and S are broadly and sometimes even 
strictly sympatric across much of West and portions of 
Central Africa (della Torre et al. 2005; Costantini et al. 
2009; Simard et al. 2009). The implication is that there has 
likely been ample opportunity for interform gene flow over a 
prolonged period of time, whether it has occurred continu- 
ously from the onset of ecological divergence or only follow- 
ing secondary contact. Despite compelling evidence for 
introgressive hybridization between M and S, genetic 
exchange in the regions sampled for this study has been in- 
sufficient to reverse genome-wide divergence, whose archi- 
tecture is largely consistent across the study area and reflected 
in M and S populations being distinguishable as separate gen- 
etic clusters in a tree based on all 400,071 SNPs. Although 
differentiation is genome wide, it occurs in a heterogeneous 
pattern whereby regions that are consistently associated with 
M-S divergence are interspersed with regions that appear to 
be introgressing between local M and S populations. Thus, the 
paradox raised by Hahn et al. (2012) remains: how is it pos- 
sible to maintain the strong form-specific association between 
alleles at unlinked markers in the face of recombination and 
introgressive hybridization between M and S? One possibility 
is that sporadic local climatic changes can lead to a temporary 
relaxation of ecologically based postmating barriers, long 
enough to allow substantial introgression and recombination 
before environmental conditions return to normal and again 
impose strong divergent selection, which would purge unsuc- 
cessful allelic combinations. In addition, local climatic condi- 
tions can affect both the relative frequency of M and S 



between locales and their level of sympatry, suggesting that 
the degree of hybridization may fluctuate according to the 
extent of population contact (Kamdem et al. 2012). These 
phenomena may explain not only the occasional outbreaks 
of M-S hybrids where they typically are very rare (Pombi M, 
Ayala D, Simard F, Besansky N, Costantini C, unpublished 
data) but also the unusually high rates of M-S hybridization 
and introgression in westernmost West Africa, if this region is 
experiencing climate and/or environmental change resulting in 
the breakdown of reproductive barriers between M and S. 

Although our study has provided some insight into the 
genomic architecture of M-S divergence, two additional for- 
midable challenges remain for understanding ecological adap- 
tation and speciation in A gambiae. One is the identification 
of phenotypic traits with differential fitness consequences for 
M and S in the field. Some progress has been made in this area 
(Diabate et al. 2005, 2008; Lehmann and Diabate 2008; 
Gimonneau et al. 2010, 2012b), but much more is needed. 
The second is the demonstration that particular alleles of can- 
didate genes actually affect the fitness of the phenotypic 
traits. Both challenges have rarely been met in any system 
(Barrett and Hoekstra 2011). 

In conclusion, the M and S forms of A gambiae now 
appear to represent a later stage of speciation than previously 
appreciated. Much research remains to be done to resolve 
how M and S evolved to reach this state. However, the foun- 
dation exists to build a unified natural history of speciation 
genomics for A gambiae in which the ecological basis for 
reproductive isolation is connected to underlying physiology 
and genetic causative mechanisms, and the consequences of 
these associations for the organization and evolution of pat- 
terns of genome-wide divergence resolved. 
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